Classification of endogenous and exogenous bursts in collective emotions based on Weibo comments during COVID-19

Bursts and collective emotion have been widely studied in the field of social physics, where researchers use mathematical models to understand human social dynamics. However, few studies recognize and separately analyze the internal and external influences on burst behaviors. To bridge this gap, we introduce a non-parametric approach to classify an interevent time series into five scenarios: random arrival, endogenous burst, endogenous non-burst, exogenous burst and exogenous non-burst. In order to process large-scale social media data, we first segment the interevent time series into sections by detecting change points. Then we use a rule-based algorithm to classify each section based on its distribution. To validate our model, we analyze 27.2 million COVID-19 related comments collected from Chinese social media between January and October 2020. We adopt the emotion categories of the Profile of Mood States (POMS), which consists of six emotions: Anger, Depression, Fatigue, Vigor, Tension and Confusion. This enables us to compare the burst features of different collective emotions during the COVID-19 period. The burst detection and classification approach introduced in this paper can also be applied to other complex systems, including but not limited to social media, financial markets and signal processing.


Results
In the Results section, we explain the concept of a non-parametric approach to classify endogenous and exogenous bursts based on the emotion time series and show the results of the empirical data analysis. Details of the methods are provided in the "Methods" section. Figure 1 summarizes the data analysis process. It can also be used as a guide for reading this paper.
Public emotion profile based on Weibo comments related to COVID-19. We first visualize the public emotion profile (see Fig. 2) to give a general idea of how users on the Weibo platform responded to the COVID-19 crisis from January to October 2020.
We extract the POMS emotions from Weibo comments and analyze the daily count of comments under each emotion. Briefly, the emotion extraction method can be described as follows. We build a Chinese version of the POMS dictionary containing 3944 emotional words categorized into 6 POMS emotions and 2500 neutral words.
When comparing emotions before and during COVID-19, we observe that Anger remained at a similar level and was not much affected by the COVID-19 situation. Depression rose slightly in the early outbreak period from January to April 2020 and fell back to the baseline level afterwards. Fatigue also rose during COVID-19 compared to the baseline and remained at a higher level from January to October. Vigor decreased after the COVID-19 outbreak. Although it rose a little after April 2020, the level was still lower than the baseline. Tension surged at the point when COVID-19 was first confirmed by the government, but it immediately dropped despite the worsening COVID-19 situation. This may be explained by a phenomenon called psychological resilience 34. When people are faced with a stressful situation, they tend to develop positive emotions with a moderate tension level. The overall Tension level during COVID-19 was higher than the baseline. Confusion increased during the COVID-19 period compared to the baseline and remained steady throughout the observation period.
Non-parametric approach to identify endogenous and exogenous bursts from long-term time series. In this paper, we develop a non-parametric approach to reduce computational complexity. The method consists of two steps: (1) segment the time series into homogeneous sections and (2) use hypothesis tests to check whether a time series section belongs to the endogenous burst, the exogenous burst or another arrival pattern.
Here in the Results section, we describe the concept and show the results of the non-parametric approach. The details are provided in the "Methods" section, under "Non-parametric rule-based algorithm to classify the time series sections".
(1) Segment the long-term time series into homogeneous sections. Because user behavior may change over time, which results in different distributions, we need to segment the long-term time series before doing the burst analysis. We introduce a new segmentation method applied to the comment count time series. We first aggregate the time series per 600 s to mitigate zero-valued data points while keeping the fluctuating pattern in the time series. However, each of the six emotion time series spanning over 10 months still contains more than 40,000 data points, which is large. Therefore, a non-parametric segmentation method is preferred. We adopt a method based on Fisher's Exact Test proposed by Sato and Takayasu 21.
Conceptually, the Fisher's Exact Test functions as follows. We start with the single point detection. If there exists one change point υ in a time series { r t }, then the time series segments before and after υ should be inhomogeneous, which can be tested using the Fisher's Exact Test (the lower the p-value is, the less homogenous are the two datasets). Therefore, to look for a change point in a time series, we calculate the hypergeometric probability for all the points in a time series and take the minimum hypergeometric probability as the p-value. If this p-value is lower than the pre-determined threshold, then we adopt the corresponding time point t as the change point υ . Otherwise, we conclude that there is no change point in the given time series.
To extend this method to multiple change points detection, we simply need to repeat the same process recursively on the time series segments, until all segments' p-values are higher than the threshold, which indicates that there exists no more change point.
Here we show an example of the time series segmentation result (see Fig. 3). The details of the segmentation method are provided in the "Methods" section, under "Segmentation of time series: multiple change point detection". After segmentation, the original time series for each emotion is divided into a different number of sections: Anger-651 sections, Depression-558 sections, Fatigue-208 sections, Vigor-659 sections, Tension-464 sections, Confusion-390 sections. The average section length is 14.3 h. The minimum section length is 0.5 h. The maximum section length is 55 days, because very few comments related to Fatigue were posted from August to October 2020.
(2) Classify a time series section to detect endogenous and exogenous bursts. As a pre-process, we exclude "inactive periods" from the time series after the segmentation procedure. Inactive periods typically appear in the middle of the night (2 a.m.-5 a.m.), or after May 2020 when the COVID-19 crisis was under control in China (see Fig. 3b), so there are fewer COVID-19 related comments. The inactive periods are characterized by a very low submission rate of comments, and therefore no statistical property can be discussed. If a time series section partly or fully falls within 2 a.m.-5 a.m. (the condition can be set as start time t_start ≤ 5 a.m. and end time t_end ≥ 2 a.m.), or if its average rate of comment arrival is lower than 1 comment per minute, we categorize the time series section as "inactive" and exclude it from our analysis.
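The two exclusion rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `overlaps_night` and `is_inactive` are hypothetical, and we assume each section is described by its start time, end time and comment count. For sections spanning several days, the night window is checked on every calendar day the section touches, which is our own generalization of the stated condition.

```python
from datetime import datetime, time, timedelta


def overlaps_night(start, end, night_start=time(2, 0), night_end=time(5, 0)):
    """True if [start, end] overlaps the 2 a.m.-5 a.m. window on any day it touches."""
    day = start.date()
    while day <= end.date():
        window_start = datetime.combine(day, night_start)
        window_end = datetime.combine(day, night_end)
        # Same test as in the text: t_start <= 5 a.m. and t_end >= 2 a.m.
        if start <= window_end and end >= window_start:
            return True
        day += timedelta(days=1)
    return False


def is_inactive(start, end, n_comments, min_rate_per_min=1.0):
    """Apply both rules: night overlap, or fewer than 1 comment per minute on average."""
    minutes = (end - start).total_seconds() / 60.0
    if minutes <= 0:
        return True  # degenerate section: nothing to analyze
    if n_comments / minutes < min_rate_per_min:
        return True
    return overlaps_night(start, end)
```

A section from 10:00 to 12:00 with 600 comments (5 per minute) is kept, while the same section with only 60 comments, or any section reaching into 2 a.m.-5 a.m., is excluded.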
Next, we define the categories into which we will classify a time series segment. When users post comments randomly and independently at a constant rate, the arrival of comments is a Poisson process: the number of comments posted within a fixed time interval follows a Poisson distribution and the interevent time follows an exponential distribution. We define such a scenario as scenario 1-random. When the posting behavior is neither random nor independent, the arrival of comments is clustered. This can result from an endogenous influence (such as preceding comments) or an exogenous influence (such as external news or events). When users are exposed to an endogenous influence, if the number of comments surges drastically, we define it as scenario 2-endogenous burst, otherwise as scenario 3-endogenous non-burst. Similarly, when users are exposed to an exogenous influence, if the number of comments surges drastically, we define it as scenario 4-exogenous burst, otherwise as scenario 5-exogenous non-burst. In the following we introduce the concept of classifying a time series into these five scenarios.
Conceptually, the distribution of interevent time can differ between scenarios. Let $t_j$ be the time point at which the jth comment is posted. Then $\Delta t_j = t_j - t_{j-1}$ represents the posting time interval between the jth and (j − 1)th comment.
When comments arrive randomly and independently (scenario 1-random), the number of comments within a fixed interval, $r_t$, follows a Poisson distribution with a constant average rate $\lambda_0$, $r_t \sim \mathrm{Poi}(\lambda_0)$. The interevent time $\Delta t_j$ follows an exponential distribution, $\Delta t_j \sim \mathrm{Exp}(\lambda_0)$. This can be tested using the Chi-square Test.
When the comment arrival is influenced by endogenous factors (scenario 2 and 3), the posting time series can be modeled as a self-modulation process where the probability of an event's occurrence is dependent on precedent events. Such a process is similar to the Hawkes process or self-exciting (regulating) process 20,35,36. Inspired by Takayasu and Takayasu 22, we presume that the interevent time $\Delta t_j$ between the jth and (j − 1)th comment depends on the average interevent time of the precedent comments $\Delta t_{j-1}, \Delta t_{j-2}, \ldots, \Delta t_{j-k}$ posted over the past φ period, where φ is a memory kernel and k is the number of comments in the past φ period. Namely, we assume that $\Delta t_j$ can be modeled using the following Eq. (1), in which $b_j$ is an independently and identically distributed (i.i.d.) random variable which follows the exponential distribution, $b_j \sim \mathrm{Exp}(1)$.

$$\Delta t_j = b_j \, \langle \Delta t_{j-1}, \Delta t_{j-2}, \ldots, \Delta t_{j-k} \rangle \qquad (1)$$

In this paper, the angle brackets $\langle \; \rangle$ denote the averaged value, namely,

$$\langle \Delta t_{j-1}, \Delta t_{j-2}, \ldots, \Delta t_{j-k} \rangle = \frac{1}{k} \sum_{i=1}^{k} \Delta t_{j-i}.$$

The value of the memory period, φ, and the corresponding number of precedent comments, k, can be determined as follows. When there exists such dependency in a time series $\{\Delta t_j\}$, the autocorrelation function $\rho(\tau)$ is nonzero, especially at the first lag: $\rho(\tau = 1) \neq 0$. Therefore, we can adjust the value of φ and k to calculate $b_j = \Delta t_j / \langle \Delta t_{j-1}, \ldots, \Delta t_{j-k} \rangle$ based on Eq. (1) and the corresponding autocorrelation function $\rho(1)$ based on Eq. (8). The optimal values of φ and k are the ones that remove (minimize) the autocorrelation value $\rho(1)$ of the time series $\{b_j\}$. Details are described in the "Methods" section. Figure 4a shows an example of determining the value of φ based on the autocorrelation function. If the normalized interevent time $\{b_j\}$ follows an exponential distribution (to be tested by the Chi-square test), it means that the original interevent time series $\{\Delta t_j\}$ can be modeled by Eq. (1). Therefore, it can be classified to Scenario 2 and 3-endogenous. Otherwise, if $\{\Delta t_j\}$ does not belong to Scenario 1, 2 or 3, we presume that the distribution of interevent time $\Delta t_j$ is influenced by exogenous factors such as external news or events.

Figure 3, panels (b) and (c). (b) The comment arrival density plot corresponding to (a). It shows that the originally uneven and clustered distribution of arrival can be segmented into sections with similar comment arrival density. (c) Results of the recursive Fisher's Exact Test for detecting multiple change points in a time series. The hypergeometric probability is calculated using Eq. (5) for each sub-segment. The minimum hypergeometric probability value of each segment is taken as the p-value. The p-value is then compared to a threshold $p_{th}$ to determine whether it is small enough for the time point t to be considered a change point υ. If yes, the sub-segment is cut into 2 sub-segments at the change point υ. Then the same process is repeated until the p-values of all sub-segments are larger than the threshold. The data label CP:i refers to the change point detected in the ith loop.

Scientific Reports | (2022) 12:3120 | https://doi.org/10.1038/s41598-022-07067-w
However, not all time series classified to the endogenous or exogenous category are bursts (drastic increases). Some time series can be fluctuating or decreasing. Therefore, we need to calculate the increment rate of a time series. Only when the increment rate is larger than the threshold θ can the time series segment be classified as a burst. To find the threshold value θ, we calculate the increment rate (see Eq. (9)) for all endogenous and exogenous time series and plot the histogram (Fig. 4b). We take the first local minimum of the histogram, where the increment rate equals 1, as the threshold value (see the red bar in Fig. 4b). Figure 4c shows an example of the classification result where the increment value of each segment is annotated in the figure. We can see that the exogenous and endogenous segments with an obvious increment are classified as bursts, while those fluctuating or decreasing are classified as non-bursts.
The five scenarios we introduce for categorization of segments are summarized as follows:
• Scenario 1: Random
Interevent time follows an exponential distribution, $\Delta t_j \sim \mathrm{Exp}(\lambda_0)$.
• Scenario 2 and 3: Endogenous
Interevent time $\Delta t_j$ is dependent on the averaged preceding interevent times and Eqs. (1) and (2) are fulfilled. It will be further classified into the two scenarios below.
- Scenario 2: Endogenous burst
- Scenario 3: Endogenous non-burst
• Scenario 4 and 5: Exogenous
The distribution of interevent time $\Delta t_j$ falls under neither scenario 1 nor scenarios 2 and 3. It will be further classified into the two scenarios below.
- Scenario 4: Exogenous burst
- Scenario 5: Exogenous non-burst

Based on the concept described above, we propose a non-parametric approach, a rule-based algorithm that categorizes segments of time series based on the distribution of time intervals between consecutive comments. The detailed method and algorithm are provided in the "Methods" section, under "Non-parametric rule-based algorithm to classify the time series sections". Figure 4c-f shows that our proposed non-parametric approach can segment a time series into homogeneous sections and determine the type of arrival pattern.

Burst features of different emotions and periods based on the COVID-19 related comments.
Having segmented the 6 emotion time series and classified each time series section into the five scenarios, we summarize the duration of each scenario by month and emotion (see Fig. 5). We can observe that for each emotion, the burst period was longer from January to April compared to May to October. This trend corresponded to the count of COVID-19 new infection cases which peaked in February, underwent recovery from February to April and was under control since May. Therefore, collective emotion was considered to be affected by the COVID-19 situation.
Among all the emotions, Vigor, Anger, Depression had the longest burst duration which on average reached close to 50% during February to March in 2020. This indicates that these three emotions can easily form collective emotions during outbreak and recovery period of the COVID-19 in China.
For Anger and Vigor, the percentage of exogenous bursts was much higher than that of endogenous bursts, suggesting that Anger and Vigor were more likely triggered by external factors such as news and propaganda published by official accounts such as governments or the press. In contrast, Depression had a more balanced share of exogenous and endogenous bursts.
Tension had shorter burst periods compared to the above three emotions, but it still had more than 35% of time under burst status during February and March. Tension was more likely to be triggered by endogenous influence.
Fatigue and Confusion both had a comparatively low percentage of burst duration suggesting that they were less likely to form collective emotions. Duration of endogenous bursts was longer than that of exogenous bursts, which indicates that they were more influenced by other users' comments instead of external news.
We are interested in the dynamics of emotion in terms of how the popularity of an emotion evolves during endogenous and exogenous bursts, and how the dynamics differ between emotions. In the following we further analyze the features of endogenous and exogenous bursts separately.
An endogenous burst is formed when users' emotion and posting behavior are influenced by other users' comments. We can use the self-modulation process model (Eq. 1) to explain the dynamics behind the emotion cascade. Based on the model, when the time interval between comments is short (or long), the arrival of the next comment also tends to be fast (or slow). Although the endogenous burst also represents an increase in the number of comments, the change is usually more gradual than in the exogenous burst. We can observe from Fig. 5c that the increment rate of an endogenous burst is usually lower than that of an exogenous burst. Such a self-modulation process can be normalized to a random process based on the memory period φ (Eq. 2), which suggests that the users are influenced by comments posted over the preceding φ period. In this model, the memory period φ is the key parameter to be determined empirically from the data. We plot a frequency distribution histogram in Fig. 6. We can observe that the memory periods φ of most emotions are similar, except for Fatigue. For most emotions, the average memory period ranges from 150 to 188 s, which is around 3 min. However, the average memory period of Fatigue is 318 s, which is higher (about 5 min). Therefore, we can conclude that during the COVID-19 crisis, the popularity of a collective emotion triggered by peer influence is related to short-term memory.
Next, we investigate the details of exogenous bursts. An exogenous burst can be understood as a sudden increase in comment arrival rates when influenced by external news or events, whose increment rate is comparatively higher than that of the endogenous burst (see Fig. 5c). The burst then decays gradually and returns to the non-active level as the popularity of the news fades away. In this paper, we apply a power-law function to model the dynamics of an exogenous burst. Let $r_t$ be the number of comments posted at time point t after an external news item arrives at time point $t_E$. Let β be the power-law decay exponent, so that $r_t \propto (t - t_E)^{-\beta}$. Here the power-law exponent β is the key parameter to be determined based on empirical data analysis. We apply nonlinear least squares to fit the number of comments $r_t$ to the time elapsed after the external news was posted, $(t - t_E)$. The unit time is set as 300 s. We find that 95% of the exogenous periods can be fitted well by a power-law function. The average R-squared values for measuring the goodness-of-fit are as follows: Anger-0.77, Depression-0.80, Tension-0.81, Vigor-0.85 (the results of Fatigue and Confusion are excluded because they rarely have exogenous bursts). Figure 7a shows an example of fitting data with a power-law function. Then we plot Fig. 7b to show the frequency distribution of the power-law exponent β. We can observe that the values of β for different emotions are as follows: Anger-0.42, Depression-0.62, Vigor-0.63, Tension-0.49.
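To make the role of β concrete, the following sketch estimates a decay exponent from a relaxation curve. It assumes the model $r_t \approx A\,(t - t_E)^{-\beta}$ and, to stay within the Python standard library, uses ordinary least squares on log-transformed data instead of the nonlinear least squares used in the paper; the two estimators can give different values on noisy data. The function name `fit_power_law` and the amplitude `A` are illustrative, and the R-squared returned here is computed in log space, so it is not directly comparable to the values reported above.

```python
import math


def fit_power_law(elapsed, counts):
    """Fit counts ~ A * elapsed**(-beta) by linear regression in log-log space.

    elapsed: time since the external news (t - t_E), in any positive unit
             (the paper uses units of 300 s).
    counts:  number of comments in each time unit; non-positive values are skipped.
    Returns (A, beta, r_squared_in_log_space).
    """
    pts = [(math.log(x), math.log(y)) for x, y in zip(elapsed, counts) if x > 0 and y > 0]
    n = len(pts)
    if n < 3:
        raise ValueError("need at least three positive data points")
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in pts)
    ss_tot = sum((y - mean_y) ** 2 for _, y in pts)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return math.exp(intercept), -slope, r2  # beta is the negated log-log slope
```

For example, `fit_power_law(range(1, 21), [200 * x ** -0.5 for x in range(1, 21)])` recovers β close to 0.5 on this noise-free synthetic curve.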
The power-law decay exponent represents the persistence of the collective response to an external influence 20. Our fitting result reveals an interesting difference in emotion persistence when users are exposed to exogenous news. Anger and Tension have smaller power-law exponents, which suggests that these two emotions are more persistent when exposed to external news related to COVID-19. Comparatively, Depression and Vigor have larger power-law exponents, which indicates that they are less persistent and fade away faster during the exogenous burst.
Many other studies use a power-law function to model the relaxation of bursts after an external influence. For example, Sano et al. 19 found that the burst in the number of blog posts decayed after the 2011 Japan Tsunami with a power-law exponent of 0.67; Crane and Sornette 20 found that the number of views on featured YouTube videos decayed with a power-law exponent of 0.6; Johansen and Sornette 37 found that the popularity of papers relaxed with a power-law exponent of 0.58 after being introduced in an interview. However, none of these papers studied the difference in power-law exponent between different collective emotions. Therefore, our result adds new findings related to the exogenous burst.
For exogenous bursts that are triggered by external news or events, we also investigate which news topics attracted users' attention. We gather the comments that form the exogenous bursts and use the Term Frequency Inverse Document Frequency (TF-IDF) 38 method to extract one topic word from each comment. Then we visualize the 10 most frequently occurring topics (with the highest TF-IDF value) for each emotion and each month, as shown in Fig. 7c.
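The topic extraction step can be illustrated with a small standard-library sketch. It assumes that each comment has already been tokenized into a list of words (Chinese word segmentation is outside the scope of this sketch), uses the plain TF-IDF weighting tf × log(N/df), and breaks ties alphabetically; the paper does not specify these details, so they are assumptions. The function names `comment_topics` and `top_topics` are hypothetical.

```python
import math
from collections import Counter


def comment_topics(tokenized_comments):
    """Return, for each comment, the word with the highest TF-IDF score (or None if empty)."""
    n_docs = len(tokenized_comments)
    doc_freq = Counter()
    for words in tokenized_comments:
        doc_freq.update(set(words))  # count each word once per comment
    topics = []
    for words in tokenized_comments:
        if not words:
            topics.append(None)
            continue
        term_freq = Counter(words)
        scores = {
            w: (c / len(words)) * math.log(n_docs / doc_freq[w])
            for w, c in term_freq.items()
        }
        # max() over the sorted vocabulary makes ties resolve alphabetically.
        topics.append(max(sorted(scores), key=scores.get))
    return topics


def top_topics(topics, n=10):
    """Most frequent topic words across comments, as (word, count) pairs."""
    return Counter(t for t in topics if t is not None).most_common(n)
```

With the toy input `[["mask", "shortage", "shortage"], ["mask", "vaccine", "vaccine"], ["mask", "lockdown"], ["mask", "vaccine"]]`, the word "mask" appears in every comment and therefore scores zero, so it is never selected as a topic.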

Methods
Dataset. Weibo is a leading social platform in China which has been widely used during the COVID-19 outbreak period for information sharing and communication. The number of active Weibo users reached 241 million during the first quarter of 2020, an increase of 15% compared to the same period in 2019. Like Twitter, it is a public platform where users can post and comment in short text. In this paper we use two datasets:
1. The main dataset, in which all comments are related to COVID-19. It contains 27.2 million comments that were collected from 1st January to 31st October 2020.
2. The control dataset, which is not related to COVID-19. It contains 9.2 million comments that were collected from 1st September to 31st December 2019, before the COVID-19 outbreak.
Both datasets were collected using the Weibo open API (https://open.weibo.com/wiki/API). More specifically, we first collected the COVID-19 related posts published by 1600 public accounts (news organizations, government, influential individuals, etc.) and then obtained the comments under these posts (see Fig. 8). Note that because we want to study collective emotions rather than individual emotions, only posts with more than 20 comments are included. The detailed data collection process and the profile of the 1600 verified public accounts are provided in Supplement 1.

Extract POMS emotions: emotion word classification.
There are various approaches to categorize words, sentences, or paragraphs according to a pre-defined emotion category based on the classification of emotions. The most common methods are dictionary-based approaches that look up emotion words in a pre-defined dictionary 31,32,39-42, rule-based approaches that define a rule of how to use available information (e.g. linguistic features, emoticons, etc.) for predicting the emotion categories 43,44, and machine learning algorithms that use a corpus to build a classification model 4,18,45,46. In this paper, we use the words and emotion categories defined in the POMS questionnaire. However, the pure dictionary-based approach contains a limited range of words and therefore is not suitable for analyzing social media data, which are written in informal language such as slang, abbreviations, or new words.
Our research adopts dictionary-based approach but expands it using machine learning algorithms (Fig. 8). We first created a Chinese version of POMS dictionary (the corpus has been provided in Supplement 2). Then we used this dictionary as labeled data for training a multi-class emotion classifier which can automatically classify a word to either neutral category or one of POMS emotion categories. To run the machine learning training, we looked up each word in a Word2Vec corpus (developed by Tencent AI lab that provides 200-dimension vector representation for 8 million Chinese words 47 ) to get a corresponding word vector. The Word2Vec is a technique that maps a word to a high dimensional vector space based on its semantic similarity to other words 48 . This feature has been used by researchers to classify emotions presuming that words with similar emotion are closer in the vector space 45,46,49 . Our training objective is to find the optimal boundaries between seven clusters: six POMS emotion clusters-Tension, Anger, Vigor, Fatigue, Depression, Confusion, and neutral cluster. We ran both Support Vector Machine (using Scikit-learn package 50 ) and Neural Network algorithm based on 6444 labeled words, 80% of which were used for training and 20% of which were used for cross validation. Figure 8a shows that both algorithms performed well, but the Support Vector Machine algorithm yielded a slightly higher accuracy rate. Therefore, we used a model trained by Support Vector Machine to classify new words.
We processed two datasets using the algorithm described in Fig. 8b and extracted emotions from comments. More details of this emotion classification method are explained in Supplement.

Segmentation of time series: multiple change point detection.
Change point detection algorithms have been widely studied and applied in fields like signal processing 51, financial markets 21,52 and climatology 53. They are categorized into either parametric or non-parametric approaches. The parametric approach fits the time series data to one or a selection of known distribution functions, then finds a segmentation point where the distribution changes. The non-parametric approach compares the homogeneity of the time series before and after a time point and tests whether the difference is significant enough for the point to be treated as a change point.
In this paper, we focus on the non-parametric algorithm to reduce computational complexity. There are various non-parametric algorithms developed for detecting multiple change points. The Wilcoxon Rank Statistic 54 compares the homogeneity of two samples' population mean ranks and extends it to detect multiple change points using dynamic programming. Maximum Likelihood Estimation 55, which treats the time series as binary data, uses the Bayesian information criterion to find the number of change points and dynamic programming to get the locations of the change points. Wild Binary Segmentation 56 localizes the multiple change point problem by recursively running single change point detection in subsegments based on CUSUM (cumulative sum) statistics. Density Ratio Estimation 57 adopts a non-parametric Gaussian kernel model to calculate the density ratio of sample data before and after each time point, then chooses those points as change points if the density ratios exceed a pre-defined threshold. We adopted a non-parametric method based on Fisher's Exact Test proposed by Sato and Takayasu 21. The advantage of Fisher's Exact Test is that the p-value is directly calculated based on statistics and therefore is more accurate for any data size. To apply this statistical test, we count the number of points a, b, c and d shown in the contingency table (Table 1), by comparing each data point $r_t$ to h and comparing its time point t to υ. h is a constant with value $\min(r_t) \le h \le \max(r_t)$. Its value can be determined by trying different h and adopting the one that generates the minimum p-value based on Fisher's Exact Test. To reduce computational complexity, we divide the range $[\min(r_t), \max(r_t)]$ evenly into ten deciles and take each decile as a potential h value.
Given the contingency table denoted as $X_\upsilon$, we calculate the hypergeometric probability at time point υ,

$$p(X_\upsilon) = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,(a+b+c+d)!}.$$

Then we sum up the hypergeometric probabilities p(Y) that are not larger than $p(X_\upsilon)$, over the contingency tables Y of all possible combinations of a, b, c, d that return the same subtotals,

$$P(X_\upsilon) = \sum_{Y:\; p(Y) \le p(X_\upsilon)} p(Y).$$

The p-value is the minimum over all candidate time points and thresholds,

$$p\text{-value} = \min_{\upsilon, h} P(X_\upsilon), \quad \text{where } 0 \le \upsilon \le T \text{ and } \min(r_t) \le h \le \max(r_t).$$

If the p-value is smaller than the significance level $p_{th}$, we reject the null hypothesis and treat the time point υ as a change point for the time series $\{r_t\}$. Otherwise, we conclude that there is no change point in $\{r_t\}$. The significance level $p_{th}$ is determined by shuffling the time series $\{r_t\}$ randomly N times and taking the minimum p-value generated from the N trials as $p_{th}$ (in this paper we shuffled the time series N = 1000 times). This indicates that for a random time series (no change points), the probability $\Pr(p\text{-value} \le p_{th}) = 1/N \to 0$ for large N. We repeat the same procedure recursively on the time series segments to the left and right of the detected change point until all segments' p-values are higher than $p_{th}$.
Empirically, it is still computationally costly to run the hypothesis test on a 10-month time series even after aggregating per 600 s. To reduce the data size, we process 3 days of the time series at once and then combine the last section with the following 3-day time series to maintain continuity. As a result, the data size can be confined to around 500 time points per calculation. We also observe that the significance level $p_{th}$ tends to be larger when the time series $\{r_t\}$ has a wider range. Shuffling the time series 1000 times, we obtained $p_{th} = 10^{-4}$ when $\max\{r_t\} > 50$ and $p_{th} = 10^{-6}$ when $\max\{r_t\} \le 50$.
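The recursive segmentation described in this subsection can be sketched as follows, using only the Python standard library. This is an illustrative sketch, not the authors' code. The layout of the 2×2 table is an assumption, since Table 1 is not reproduced here: rows are "r_t > h" versus "r_t ≤ h", columns are "before the split" versus "after the split". The candidate thresholds h are nine evenly spaced interior levels between min and max. The threshold `p_th` is passed in by the caller, and the shuffling procedure that calibrates it is not implemented. The counting is deliberately simple (quadratic in the segment length), which is adequate for segments of a few hundred points.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact Test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over tables with the same subtotals that are no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def best_change_point(r, n_levels=10):
    """Return (split index v, p-value) minimizing the Fisher p-value over v and h."""
    lo, hi = min(r), max(r)
    if lo == hi:
        return None, 1.0  # constant series: nothing to split
    levels = [lo + (hi - lo) * i / n_levels for i in range(1, n_levels)]
    best_v, best_p = None, 1.0
    for v in range(1, len(r)):  # candidate split into r[:v] and r[v:]
        for h in levels:
            a = sum(1 for x in r[:v] if x > h)
            b = sum(1 for x in r[v:] if x > h)
            p = fisher_two_sided(a, b, v - a, len(r) - v - b)
            if p < best_p:
                best_v, best_p = v, p
    return best_v, best_p


def segment(r, p_th, offset=0):
    """Recursively collect change points (as indices into the original series)."""
    v, p = best_change_point(r)
    if v is None or p >= p_th:
        return []
    return segment(r[:v], p_th, offset) + [offset + v] + segment(r[v:], p_th, offset + v)
```

On a series whose level jumps once, for example twenty values between 1 and 3 followed by twenty values between 19 and 25, `segment(series, p_th=1e-6)` returns the single split index 20: the two halves are too short for any further split to reach a p-value below the threshold.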
Non-parametric rule-based algorithm to classify the time series sections. The rule-based algorithm consists of 4 steps.
Step (1): Check if the original interevent time $\{\Delta t_j\}$ belongs to the random category (if it is exponentially distributed).
We start by testing whether the time series in a segment is randomly and independently distributed using the Chi-square test. Let $F(\Delta t_j)$ be the cumulative distribution function (CDF) of the real-valued interevent time $\Delta t_j$ in the segment, and $\hat{F}(\Delta t_j) = \exp(-\lambda_0 \Delta t_j)$ be the CDF of the exponential distribution that is fitted to $F(\Delta t_j)$ using nonlinear least squares.
The Chi-square test is defined as follows:
- $H_0$ ($\Delta t_j$ follows an exponential distribution): $F(\Delta t_j) = \hat{F}(\Delta t_j)$
- $H_1$ ($\Delta t_j$ does not follow an exponential distribution): $F(\Delta t_j) \neq \hat{F}(\Delta t_j)$.
The test statistic $\chi^2$ can be calculated using Eq. (7), where n represents the total number of $\Delta t_j$ data points in the time series segment. Based on $\chi^2$ and the degrees of freedom (n − 1), the p-value can be calculated using the Chi-squared distribution. The p-value is then compared to the significance level 0.05 to determine whether to reject or accept the null hypothesis. If the p-value is larger than 0.05, the null hypothesis $H_0$ is accepted, and we conclude that $\Delta t_j$ follows an exponential distribution and the arrival of comments is random, not showing burst features.
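As an illustration of testing exponentiality, the sketch below uses a textbook binned Pearson Chi-square goodness-of-fit test. This is a swapped-in variant and not the statistic of Eq. (7): the rate is estimated by maximum likelihood (n divided by the sum of intervals) rather than by nonlinear least squares, the data are placed into ten equal-probability bins under the fitted exponential, and the degrees of freedom are the number of bins minus two. With an even number of degrees of freedom the Chi-square tail probability has a closed form, which avoids any dependency beyond the standard library. The function names are hypothetical.

```python
import math


def chi2_sf_even(x, dof):
    """Upper-tail probability of the Chi-square distribution for even dof (closed form)."""
    if dof <= 0 or dof % 2:
        raise ValueError("this closed form needs a positive, even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i  # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def exponential_gof(intervals, n_bins=10):
    """Pearson Chi-square test of H0: intervals ~ Exp(rate). Returns (chi2, p_value)."""
    n = len(intervals)
    rate = n / sum(intervals)  # maximum likelihood estimate of the rate
    observed = [0] * n_bins
    for dt in intervals:
        u = 1.0 - math.exp(-rate * dt)  # fitted CDF value, uniform on [0, 1) under H0
        observed[min(int(u * n_bins), n_bins - 1)] += 1
    expected = n / n_bins
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    return chi2, chi2_sf_even(chi2, n_bins - 2)  # one dof lost to the estimated rate
```

A section would be treated as random when the returned p-value exceeds the chosen significance level (0.05 in Step (1)); perfectly regular intervals, such as one comment every 2 s, are rejected with a p-value of essentially zero.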
Step (2): If $\{\Delta t_j\}$ is not randomly distributed, remove the autocorrelation from $\{\Delta t_j\}$ and build the normalized time series $\{b_j\}$.
This step aims to find the optimal value of φ, the correlation period, and k, the number of precedent comments posted within period φ. The autocorrelation function $\rho(\tau)$ is calculated using Eq. (8), where $b_j = \Delta t_j / \langle \Delta t_{j-1}, \ldots, \Delta t_{j-k} \rangle$ (Eq. 1), $\Delta t_j$ is the time interval between the (j − 1)th and jth comment, and $j \in [1, n]$. τ is the time lag (here we set the unit lag time τ as 600 s). k is the number of comments that arrived right before the jth comment over the past φ period.
We gradually increase φ from 0 up to 3600 s (assuming users' memory is shorter than 1 h), and calculate the corresponding normalized time interval $\{b_j\}$ and its autocorrelation function $\rho(\tau)$ until we find a local minimum of $\rho(1)$ which is within distance 0.01 from the origin.
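The search for the memory period can be sketched as below. Several choices here are assumptions made for illustration, because Eq. (8) and the exact windowing are not reproduced in this text: the window for comment j contains the intervals whose closing comment arrived within φ seconds before comment j − 1 (so k ≥ 1 always), the autocorrelation is the ordinary sample autocorrelation at a lag of one event, and the scan stops at the first φ on a user-supplied grid with |ρ(1)| ≤ 0.01, falling back to the grid value with the smallest |ρ(1)|. All function names are hypothetical.

```python
def normalize_intervals(times, phi):
    """b_j = dt_j / mean(dt over comments in the preceding phi seconds), per Eq. (1)."""
    dt = [b - a for a, b in zip(times, times[1:])]  # dt[i] ends at times[i + 1]
    out, left, window_sum = [], 0, 0.0
    for j in range(1, len(dt)):
        window_sum += dt[j - 1]  # the interval ending at times[j] enters the window
        # Drop intervals whose closing comment is older than phi seconds.
        while times[left + 1] < times[j] - phi:
            window_sum -= dt[left]
            left += 1
        mean = window_sum / (j - left)  # k = j - left preceding intervals
        if mean > 0:
            out.append(dt[j] / mean)
    return out


def lag1_autocorr(x):
    """Sample autocorrelation at lag 1 (0.0 for very short or constant input)."""
    n = len(x)
    if n < 2:
        return 0.0
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x)
    if var == 0:
        return 0.0
    return sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1)) / var


def choose_phi(times, grid, tol=0.01):
    """Scan phi over grid; stop at the first |rho(1)| <= tol, else keep the minimizer."""
    best_phi, best_rho = None, float("inf")
    for phi in grid:
        rho = lag1_autocorr(normalize_intervals(times, phi))
        if abs(rho) < abs(best_rho):
            best_phi, best_rho = phi, rho
        if abs(rho) <= tol:
            break
    return best_phi, best_rho
```

A call such as `choose_phi(times, range(0, 3601, 30))` scans φ from 0 to 3600 s in 30 s steps, where `times` is the sorted list of comment timestamps in seconds for one section; the step size is an arbitrary choice for this sketch.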
Step (3): Check if { b j } follows an exponential distribution using the Chi-square test (similar to step (1)).
Here we set Chi-square test's significance level at 0.0005 instead of 0.05 because in Eq. (1) for calculating the average time interval of comments posted within a period φ, we assume that the memory period φ is fixed for the time series segment to reduce computation complexity. As described in the 2nd step, the value of φ may be varied so we lower the significance level to mitigate the impact of this assumption on the result. If the Chi-square test's null hypothesis is accepted, meaning the normalized time intervals follow the exponential distribution, then we conclude that the comment arrival is caused by an endogenous self-modulation effect (scenario 2 or 3). Otherwise, the segment is considered as exogenous (scenario 4 or 5).
Step (4): The rule-based algorithm to classify a time series section into the five scenarios can be summarized using the following pseudocode:
If {t_j} follows an exponential distribution (step 1):
    Return "random arrival"
Else If {b_j} follows an exponential distribution (step 3):
    If increment rate of comment count ∆ ( ) ≥ threshold value θ = 1:
        Return "endogenous burst"
    Else:
        Return "endogenous non-burst"
    End If
Else:
    If increment rate of comment count ∆ ( ) ≥ threshold value θ = 1:
        Return "exogenous burst"
    Else:
        Return "exogenous non-burst"
    End If
End If
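The decision rules of steps (1) to (4) can also be written as a short function. This is a minimal sketch, assuming the outcomes of the two Chi-square tests and the increment rate of the comment count have already been computed elsewhere; the function name and its parameters are hypothetical and only the five scenario labels and the threshold θ = 1 come from the text.

```python
def classify_segment(raw_is_exponential, normalized_is_exponential,
                     increment_rate, theta=1.0):
    """Map the test outcomes for one time series segment to one of the five scenarios.

    raw_is_exponential: step (1) did not reject exponential raw intervals.
    normalized_is_exponential: step (3) did not reject exponential normalized intervals.
    increment_rate: increment rate of the comment count (computed elsewhere).
    theta: burst threshold, 1 in the text.
    """
    if raw_is_exponential:
        return "random arrival"
    origin = "endogenous" if normalized_is_exponential else "exogenous"
    kind = "burst" if increment_rate >= theta else "non-burst"
    return f"{origin} {kind}"
```

For example, a segment whose raw intervals fail the exponential test, whose normalized intervals pass it, and whose increment rate is 1.5 would be labeled "endogenous burst".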

Conclusion and discussion
In this paper, we have proposed a novel approach to detect endogenous and exogenous emotion bursts from 27.2 million Weibo comments. We use a machine-learning algorithm to extract multi-class emotions from comment texts, segment the emotion time series into sections and then use a rule-based algorithm to identify the bursts. Our approach is non-parametric and therefore suitable for analyzing datasets of any size.
The analysis reveals interesting differences in burst features between collective emotions. Vigor, Anger and Depression had significantly longer burst durations than Fatigue and Confusion, especially during the COVID-19 outbreak period. Vigor and Anger bursts were triggered more by exogenous influence, while Tension, Fatigue and Confusion bursts were triggered more by endogenous influence. For endogenous bursts, we show that the word-of-mouth dynamics can be modeled by a self-modulation process during which emotions cascade based on a short-term memory period φ. The values of φ are similar for most emotions at around 3 min, while Fatigue has a longer memory period of 5 min. For exogenous bursts, we show that the drastic surge in the number of comments followed by relaxation can be modeled by a power-law function whose decay exponent β represents the persistence of the external influence. For emotion bursts triggered by exogenous factors, we find that the values of β are smaller for Anger (0.42) and Tension (0.49) and larger for Depression (0.62) and Vigor (0.63), suggesting that the external influence on Anger and Tension is more persistent.
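To make the role of the decay exponent β concrete, the sketch below estimates it from a relaxation curve by ordinary least squares on log-log axes. It assumes the simple form count(t) = a · t^(−β) with t measured from the burst peak; the exact functional form and fitting procedure used in this study are not restated here, so the function name and the fitting choice are illustrative assumptions.

```python
import math


def fit_power_law_decay(times, counts):
    """Fit log(count) = log(a) - beta * log(t) by least squares; return (a, beta).

    Assumption: times and counts are strictly positive and t is measured
    from the burst peak.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), -slope
```

A smaller fitted β means the comment rate decays more slowly after the peak, which is how the persistence of the external influence is read from the exponent.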
To our knowledge, the burst analysis based on multi-class collective emotions during COVID-19 is a novel research topic. We have shared the detailed empirical data analysis method to make it easily reproducible by other researchers. It can be an interesting future study to compare the burst features between different countries (such as the length of endogenous and exogenous bursts, the burst modeling parameters and key topic words), which may show differences in collective emotion response based on different COVID-19 situations, cultural backgrounds and prevention measures.
Our proposed burst detection method is applicable not only to social media data but also to financial markets, where each trader's behavior may be influenced by other traders (endogeneity) or external news (exogeneity). It may also be applied to analyze burst phenomena in complex systems such as computer networks and servers. For example, an internet traffic burst that threatens network security and affects user experience may be caused by endogenous factors (such as uneven and clustered usage) or external factors (a cyber-attack or other extraordinary events). Our method can help to detect abnormalities, identify the root causes of the bursts, and improve system performance.
The limitations of this paper are as follows. Firstly, due to the limited API usage and access provided by Weibo, we only obtained comments under posts published by 1600 official accounts. Compared to the large user population on Weibo, our results may not represent the whole user group. By using the comment data, we may constrain the topics being discussed among the users to the contents published by the official accounts. Ideally, our method would be applied to the full post data to detect emotion bursts in the real public sphere. Secondly, we assumed that under the endogenous burst scenario, the rate of arrival depends on the average rate of comments over the past period φ, where φ is a constant value for the given time series segment. This assumption is based on the intuition that users tend to read the latest comments over a certain period and then post a comment. However, the actual user behavior and the way that the comments are presented to users (for example, the most liked or interacted comments are promoted to the top) may be more dynamic and more complicated. Thirdly, we analyzed the six emotions independently, without examining the synergy between emotions. We are interested in exploring this topic in future work.
Scientific Reports | (2022) 12:3120 | https://doi.org/10.1038/s41598-022-07067-w www.nature.com/scientificreports/

Data availability
The datasets analyzed during the current study are not publicly available due to Weibo's open API policy (keeping personal data confidential), but aggregated and anonymized data are available from the corresponding author on reasonable request. Similar data can be obtained using the Weibo API (https://open.weibo.com/wiki/API). Details are provided in Supplementary 1 (3. Data Collection Process).